Finding Dantzig selectors with a proximity operator based fixed-point algorithm

Ashley Prater, Lixin Shen, Bruce W. Suter


Abstract 

In this paper, we study a simple iterative method for finding the Dantzig selector, which 
was designed for linear regression problems. The method consists of two main stages. The first 
stage is to approximate the Dantzig selector through a fixed-point formulation of solutions to 
the Dantzig selector problem. The second stage is to construct a new estimator by regressing 
data onto the support of the approximated Dantzig selector. We compare our method to an 
alternating direction method, and present the results of numerical simulations using both the 
proposed method and the alternating direction method on synthetic and real data sets. The 
numerical simulations demonstrate that the two methods produce results of similar quality;
however, the proposed method tends to be significantly faster.

Key Words: Dantzig selector, proximity operator, fixed-point algorithm, alternating direction 
method 

1 Introduction 

This paper considers the problem of estimating a parameter vector $\beta \in \mathbb{R}^p$ from the linear
problem

$$y = X\beta + z, \qquad (1)$$

where $y \in \mathbb{R}^n$ is a vector of observations, $X$ is an $n \times p$ predictor matrix, and $z$ is a vector of independent
normal random variables. The goal is to find a relevant parameter vector $\beta^* \in \mathbb{R}^p$ among many
potential candidates and obtain high prediction accuracy.

The penalized least squares estimator for problem (1) has been the focus of a great deal
of attention for variable selection and estimation in high-dimensional linear regression when the
number of variables is much larger than the sample size [10, 20, 23, 25, 26, 29]. Recently the
Dantzig selector was proposed for problem (1) in [6]. The Dantzig selector $\hat\beta \in \mathbb{R}^p$ is a solution to
the optimization problem

$$\hat\beta \in \operatorname{argmin}\{\|\beta\|_1 : \|D^{-1}X^\top(X\beta - y)\|_\infty \le \delta\}, \qquad (2)$$

with a fixed parameter $\delta > 0$ and a diagonal matrix $D$ whose diagonal entries are equal to
the $\ell_2$ norms of the columns of $X$. Here, we write $\|x\|_q$ for the $\ell_q$ norm of $x \in \mathbb{R}^p$, $1 \le q \le \infty$.

Optimal $\ell_2$ rate properties for $\|\hat\beta - \beta^*\|_2$ were established under a sparsity scenario, and impressive
empirical performance on real world problems involving large values of $p$ was shown in [6]. Since
then the Dantzig selector has received a considerable amount of attention. Discussions on the
Dantzig selector can be found in [3, 5, 7, 11, 13, 21, ?]. In [15], an algorithm was proposed for
fitting the entire coefficient path of the Dantzig selector with a computational cost similar to that of the
least angle regression algorithm used to compute the $\ell_1$ minimization via the LASSO technique. The
Dantzig selector problem is convex but not strictly convex, so unique solutions are in
general not guaranteed. Conditions ensuring the uniqueness of the Dantzig selector were presented
in [9]. In [17], a new class of Dantzig selectors for linear regression problems with right-censored
outcomes was proposed.

The importance of the Dantzig selector in linear regression has been demonstrated in the
aforementioned work. Efficient methods for solving problem (2) are much needed but have
received little emphasis in the literature. In [6], the problem is cast as a linear program which
is solved by using a primal-dual interior point algorithm [3]. As is well known, interior point
methods are not efficient for large-scale problems. In [2], the problem is cast as a linear cone
programming problem for which a smooth approximation to its dual problem is solved by an optimal
first-order method [?]. Recently, an alternating direction method (ADM) for finding the Dantzig
selector was studied in [18]. Numerical experiments showed that this method usually outperforms
the method in [2] in terms of CPU time while producing solutions of comparable quality. The
problem was rewritten in [18] in a form to which ADM can be easily applied. ADM itself is an
iterative algorithm. In each iteration, two subproblems must be solved successively. One of
the subproblems has a closed form solution, while the other does not and is approximated by a
nonmonotone gradient method proposed in [19]. To alleviate the difficulty caused by the
subproblem without a closed form solution, a linearized ADM was proposed for the Dantzig selector in [28] and
was shown to be efficient on both synthetic and real world data sets.

In this paper, the Dantzig selectors for problem (2) are found by an algorithm based upon
proximity operators. We first rewrite the problem as an unconstrained structural optimization
problem via an indicator function. The resulting problem is then solved by a primal-dual algorithm.
In comparison with the one given in [18], our proposed algorithm is easy to implement and
achieves results of comparable quality while consuming much less CPU time.

The paper is organized as follows. In Section 2 we present our fixed-point theory
based proximity operator algorithm for solving problem (2). In Section 3 we present numerical
experiments comparing the accuracy and efficiency of the proposed algorithm with the ADM proposed
in [18]. The first set of experiments uses simulated sparse signals and the second set uses samples
of biomarker data to predict the diagnosis of leukemia patients. Section 4 concludes the paper.

The following notation will be used in the rest of the paper. For any vector $u \in \mathbb{R}^p$, let $u_i$ and
$u(i)$ both denote the $i$-th component of $u$. Also for any vector $u \in \mathbb{R}^p$, $|u|$ is the vector of component-wise
absolute values of $u$, that is, the $i$-th component of $|u|$ is $|u_i|$, while $\operatorname{sign}(u)$ is the vector whose $i$-th
component is $1$ if $u_i > 0$ and $-1$ otherwise. Given two vectors $u$ and $v$ in $\mathbb{R}^p$, $u \circ v$ denotes the
Hadamard (component-wise) product of $u$ and $v$, $\max\{u, v\}$ denotes the vector whose $i$-th entry
is $\max\{u_i, v_i\}$, and $\min\{u, v\}$ denotes the vector whose $i$-th entry is $\min\{u_i, v_i\}$. Let $\mathbf{1}$ denote the
vector of all ones whose dimension should be clear from the context.

The natural numbers are given by $\mathbb{N}$. For the usual $d$-dimensional Euclidean space, denoted by
$\mathbb{R}^d$, we define $\langle x, y\rangle := \sum_{i=1}^{d} x_i y_i$ for $x, y \in \mathbb{R}^d$, the standard inner product in $\mathbb{R}^d$. We denote by
$\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ the $\ell_1$ norm, $\ell_2$ norm, and the $\ell_\infty$ norm of a vector, respectively. The class
of all lower semicontinuous convex functions $f : \mathbb{R}^d \to (-\infty, +\infty]$ such that $\operatorname{dom} f := \{x \in \mathbb{R}^d :
f(x) < +\infty\} \neq \emptyset$ is denoted by $\Gamma_0(\mathbb{R}^d)$. For a closed convex set $C$ of $\mathbb{R}^d$, its indicator function $\iota_C$
is in $\Gamma_0(\mathbb{R}^d)$ and is defined as

$$\iota_C(u) := \begin{cases} 0, & \text{if } u \in C, \\ +\infty, & \text{otherwise.} \end{cases}$$

For a function $f \in \Gamma_0(\mathbb{R}^d)$, $\operatorname{argmin}_{x \in C} f(x)$ is the set of points in $C$ at which
$f$ attains its minimum value, i.e., $\operatorname{argmin}_{x \in C} f(x) = \{x \in C : f(y) \ge f(x) \text{ for all } y \in C\}$.

2 The Dantzig Selector with Proximity Algorithms 

In this section, we develop a proximity algorithm for solving the optimization problem (2). We
begin by reviewing two existing works on this problem, namely the alternating direction method
(ADM) proposed in [18] and the linearized alternating direction method of multipliers (LADM)
proposed in [28]. Both methods work on the reformulated optimization problem (2) with $D = I$ as
follows:

$$\min_{\beta \in \mathbb{R}^p,\ \tau \in \{\tau : \|\tau\|_\infty \le \delta\}} \{\|\beta\|_1 : X^\top(X\beta - y) = \tau\}, \qquad (3)$$

where $\tau \in \mathbb{R}^p$ is an auxiliary variable. The augmented Lagrangian function for problem (3) is

$$L_c(\beta, \tau, \gamma) := \|\beta\|_1 + \langle \gamma, X^\top(X\beta - y) - \tau \rangle + \frac{c}{2}\|X^\top(X\beta - y) - \tau\|_2^2,$$

where $\gamma \in \mathbb{R}^p$ is the Lagrange multiplier and $c > 0$ is a penalty parameter.

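The iterative scheme of ADM for optimization problem (3) is

$$\begin{cases}
\tau^{k+1} = \operatorname{argmin}_{\tau \in \{\tau : \|\tau\|_\infty \le \delta\}} L_c(\beta^k, \tau, \gamma^k), \\
\beta^{k+1} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} L_c(\beta, \tau^{k+1}, \gamma^k), \\
\gamma^{k+1} = \gamma^k + c\,(X^\top(X\beta^{k+1} - y) - \tau^{k+1}),
\end{cases}$$

which, with some elementary manipulations, can equivalently be written as

$$\begin{cases}
\tau^{k+1} = \operatorname{argmin}_{\tau \in \{\tau : \|\tau\|_\infty \le \delta\}} \|\tau - (X^\top(X\beta^k - y) + \tfrac{1}{c}\gamma^k)\|_2^2, \\
\beta^{k+1} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \{\|\beta\|_1 + \tfrac{c}{2}\|X^\top(X\beta - y) - \tau^{k+1} + \tfrac{1}{c}\gamma^k\|_2^2\}, \\
\gamma^{k+1} = \gamma^k + c\,(X^\top(X\beta^{k+1} - y) - \tau^{k+1}).
\end{cases} \qquad (4)$$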
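The "elementary manipulations" amount to completing the square in the augmented Lagrangian. As a sketch, with the shorthand $r := X^\top(X\beta - y)$ (introduced here only for this derivation), the two $\gamma$-dependent terms combine as follows, which yields both subproblems of the equivalent scheme once terms independent of the minimization variable are dropped:

```latex
\begin{align*}
\langle \gamma, r - \tau \rangle + \frac{c}{2}\|r - \tau\|_2^2
  &= \frac{c}{2}\Big( \|r - \tau\|_2^2 + \frac{2}{c}\langle \gamma, r - \tau \rangle
       + \frac{1}{c^2}\|\gamma\|_2^2 \Big) - \frac{1}{2c}\|\gamma\|_2^2 \\
  &= \frac{c}{2}\Big\| r - \tau + \frac{1}{c}\gamma \Big\|_2^2 - \frac{1}{2c}\|\gamma\|_2^2 .
\end{align*}
```

With $\beta = \beta^k$ and $\gamma = \gamma^k$ fixed, minimizing over $\tau$ gives the first line of the scheme; with $\tau = \tau^{k+1}$ and $\gamma = \gamma^k$ fixed, adding $\|\beta\|_1$ and minimizing over $\beta$ gives the second.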


The $\tau$-related subproblem in (4) has a closed form solution, but the $\beta$-related subproblem does not
and is solved approximately by using the nonmonotone gradient method in [19].
The iterative scheme of LADM for optimization problem (3) is

$$\begin{cases}
\beta^{k+1} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \{\|\beta\|_1 + c\,\langle r^k, \beta - \beta^k \rangle + \tfrac{c\ell}{2}\|\beta - \beta^k\|_2^2\}, \\
\tau^{k+1} = \operatorname{argmin}_{\tau \in \{\tau : \|\tau\|_\infty \le \delta\}} \|\tau - (X^\top(X\beta^{k+1} - y) + \tfrac{1}{c}\gamma^k)\|_2^2, \\
\gamma^{k+1} = \gamma^k + c\,(X^\top(X\beta^{k+1} - y) - \tau^{k+1}),
\end{cases} \qquad (5)$$

where $\ell > 0$ is a proximal parameter and $r^k := X^\top X (X^\top(X\beta^k - y) - \tau^k + \tfrac{1}{c}\gamma^k)$. Note that the order
of updating $\tau$ and $\beta$ in ADM is reversed in LADM. For the $\beta$-subproblem in LADM, the
last two terms in the objective function can be viewed as the linearization of the quadratic term
$\tfrac{c}{2}\|X^\top(X\beta - y) - \tau^k + \tfrac{1}{c}\gamma^k\|_2^2$ with respect to $\beta$ at $\beta^k$ after dropping a constant. Furthermore, the
$\beta$-subproblem, after completing the square of these two terms and ignoring the resulting constant
term, is the same as

$$\beta^{k+1} = \operatorname{argmin}_{\beta \in \mathbb{R}^p} \{\|\beta\|_1 + \tfrac{c\ell}{2}\|\beta - (\beta^k - \tfrac{1}{\ell} r^k)\|_2^2\},$$

which has a closed form solution. The $\tau$-subproblem has a closed form solution as in ADM.
Therefore, LADM can be easily and efficiently implemented. It was shown in [28] that for any
$c > 0$ and $\ell > 2\|X^\top X\|_2^2$ and any initial iterate $(\beta^0, \tau^0, \gamma^0)$, the sequence $\{(\beta^k, \tau^k, \gamma^k) : k \in \mathbb{N}\}$
converges. Furthermore, the limit of the sequence $\{(\beta^k, \tau^k) : k \in \mathbb{N}\}$ is a solution of the Dantzig
selector problem (3).

In the following, we present our fixed-point theory based proximity operator algorithm for
solving the optimization problem (2). For simplicity of exposition, with the matrices $X$ and $D$, the
vector $y$, and the constant $\delta$ appearing in problem (2), we set

$$A := D^{-1}X^\top X, \quad b := D^{-1}X^\top y, \quad C := \{\beta \in \mathbb{R}^p : \|\beta - b\|_\infty \le \delta\}. \qquad (6)$$

Then the optimization problem (2) can be rewritten as

$$\hat\beta \in \operatorname{argmin}\{\|\beta\|_1 + \iota_C(A\beta) : \beta \in \mathbb{R}^p\}. \qquad (7)$$
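As a concrete illustration of the quantities in (6) and the objective in (7), the following minimal NumPy sketch builds $A$ and $b$ from a given $X$ and $y$ and evaluates the objective at a candidate point. The function names `dantzig_setup` and `objective` are illustrative choices, not part of the original work, and the sketch assumes that no column of $X$ is identically zero.

```python
import numpy as np

def dantzig_setup(X, y):
    """Return (A, b) as in (6), with D the diagonal matrix of column l2 norms of X."""
    d = np.linalg.norm(X, axis=0)      # diagonal entries of D (assumed nonzero)
    A = (X.T @ X) / d[:, None]         # D^{-1} X^T X: row i is divided by d[i]
    b = (X.T @ y) / d                  # D^{-1} X^T y
    return A, b

def objective(beta, A, b, delta):
    """Objective of (7): ||beta||_1 if A beta lies in C, +inf otherwise."""
    feasible = np.max(np.abs(A @ beta - b)) <= delta
    return np.sum(np.abs(beta)) if feasible else np.inf

# Tiny example with hand-checkable numbers.
X = np.array([[3.0, 0.0], [4.0, 2.0]])
y = np.array([1.0, 1.0])
A, b = dantzig_setup(X, y)
```

The indicator term makes the objective infinite outside the feasible set, so for instance the zero vector is admissible exactly when $\|b\|_\infty \le \delta$.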

The objective function of this problem is convex and coercive thanks to the $\ell_1$-norm being coercive.
Hence a solution to problem (7) exists and can be characterized in terms of proximity operators. To
this end, we review the definition of the proximity operator.

For a function $f \in \Gamma_0(\mathbb{R}^d)$, the proximity operator of $f$ with parameter $\lambda$, denoted by $\operatorname{prox}_{\lambda f}$,
is a mapping from $\mathbb{R}^d$ to itself, defined for a given point $x \in \mathbb{R}^d$ by

$$\operatorname{prox}_{\lambda f}(x) := \operatorname{argmin}_{u \in \mathbb{R}^d} \left\{ \frac{1}{2\lambda}\|u - x\|_2^2 + f(u) \right\}.$$

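Now, we can present a characterization of solutions of problem (7) that is simply derived from
Fermat's rule.

Theorem 2.1 Let the $p \times p$ matrix $A$ and the vector $b \in \mathbb{R}^p$ be given in (6). If $\beta \in \mathbb{R}^p$ is a solution
to problem (7), then for any $\alpha > 0$ and $\lambda > 0$ there exists a vector $\tau \in \mathbb{R}^p$ such that

$$\beta = \operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}\left(\beta - \frac{\lambda}{\alpha} A^\top \tau\right), \qquad (8)$$

$$\tau = (I - \operatorname{prox}_{\iota_C})(A\beta + \tau). \qquad (9)$$

Conversely, if there exist $\alpha > 0$ and $\lambda > 0$ such that $\beta, \tau \in \mathbb{R}^p$ satisfy equations (8) and (9), then
$\beta$ is a solution of problem (7).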

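Proof: The result follows straightforwardly from a general result in [16, Proposition 1]. For
completeness, we present its proof here. First, we assume that $\beta$ is a solution to problem (7). By
Fermat's rule and the chain rule of subdifferentiation, $0 \in \partial\|\cdot\|_1(\beta) + A^\top \partial\iota_C(A\beta)$. Then for any
$\alpha > 0$ and $\lambda > 0$ there exists $\tau \in \frac{1}{\lambda}\partial\iota_C(A\beta)$ such that $-\frac{\lambda}{\alpha}A^\top\tau \in \partial\left(\frac{1}{\alpha}\|\cdot\|_1\right)(\beta)$, that is, in terms
of the proximity operator, equation (8). Since the set $\partial\iota_C(A\beta)$ is a cone, $\tau \in \frac{1}{\lambda}\partial\iota_C(A\beta)$ implies
$\tau \in \partial\iota_C(A\beta)$, which is essentially equivalent to equation (9).

Conversely, if equations (8) and (9) are satisfied, we then have $-\frac{\lambda}{\alpha}A^\top\tau \in \partial\left(\frac{1}{\alpha}\|\cdot\|_1\right)(\beta)$ and
$\tau \in \partial\iota_C(A\beta)$ accordingly. Using the fact that the set $\partial\iota_C(A\beta)$ is a cone again, the second inclusion
$\tau \in \partial\iota_C(A\beta)$ implies $\frac{\lambda}{\alpha}\tau \in \frac{1}{\alpha}\partial\iota_C(A\beta)$. Multiplying both sides of the previous inclusion by $A^\top$
and using the chain rule $\partial(\iota_C \circ A)(\beta) = A^\top\partial\iota_C(A\beta)$, we have that $\frac{\lambda}{\alpha}A^\top\tau \in \frac{1}{\alpha}\partial(\iota_C \circ A)(\beta)$. Since
$-\frac{\lambda}{\alpha}A^\top\tau \in \partial\left(\frac{1}{\alpha}\|\cdot\|_1\right)(\beta)$, we obtain $0 \in \partial\|\cdot\|_1(\beta) + \partial(\iota_C \circ A)(\beta)$. This shows that $\beta$ is a solution to
problem (7). □

We comment on the computation of the proximity operators $\operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}$ and $\operatorname{prox}_{\iota_C}$ appearing
in equations (8) and (9). The proximity operator $\operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}$ at any $u \in \mathbb{R}^p$ is the well-known
soft-thresholding operator given as follows:

$$\operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}(u) = \operatorname{sign}(u) \circ \max\left\{|u| - \tfrac{1}{\alpha}\mathbf{1}, 0\right\}. \qquad (10)$$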
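The soft-thresholding operator in (10) is a one-line computation. The sketch below is a minimal NumPy version in which the threshold `t` plays the role of $1/\alpha$; the function name `soft_threshold` is an illustrative choice. Note that `np.sign(0)` returns 0 rather than the $-1$ used in the notation of this paper, but the result is unaffected because the $\max$ factor vanishes whenever $|u_i| \le t$.

```python
import numpy as np

def soft_threshold(u, t):
    """Component-wise sign(u) * max(|u| - t, 0), i.e. (10) with t = 1/alpha."""
    u = np.asarray(u, dtype=float)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# Entries with magnitude at most t are set to zero; the rest shrink toward zero by t.
shrunk = soft_threshold([3.0, -0.5, -2.0, 0.0], 1.0)
```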


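Lemma 2.2 Let $\delta$ be a constant, let $b$ be a vector in $\mathbb{R}^p$, and let the set $C$ be given in (6). Then
for any vector $v \in \mathbb{R}^p$,

$$\operatorname{prox}_{\iota_C}(v) = b + \min\{\max\{v - b, -\delta\mathbf{1}\}, \delta\mathbf{1}\} \qquad (11)$$

and

$$(I - \operatorname{prox}_{\iota_C})(v) = \operatorname{prox}_{\delta\|\cdot\|_1}(v - b).$$

Proof: It is well known that the proximity operator $\operatorname{prox}_{\iota_C}$ is the projection operator onto the
set $C$. The set $C$ is the cube in $\mathbb{R}^p$ with $b$ as its center and $2\delta$ as the length of its side.
Hence, $\operatorname{prox}_{\iota_C}(v)$, the projection of the vector $v \in \mathbb{R}^p$ onto $C$, is given by (11). Further, it holds that
$(I - \operatorname{prox}_{\iota_C})(v) = (v - b) - \min\{\max\{v - b, -\delta\mathbf{1}\}, \delta\mathbf{1}\}$. From this identity, we can directly check
that for each $i$ from $1$ to $p$

$$(v - b)_i - \min\{\max\{(v - b)_i, -\delta\}, \delta\} = \operatorname{sign}((v - b)_i) \cdot \max\{|(v - b)_i| - \delta, 0\},$$

which, by using equation (10), is $\operatorname{prox}_{\delta|\cdot|}((v - b)_i)$. This completes the proof. □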
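Both statements of Lemma 2.2 are easy to check numerically. The following sketch, with illustrative function names and an arbitrary random seed, implements the projection (11) and compares $(I - \operatorname{prox}_{\iota_C})(v)$ with soft-thresholding of $v - b$ at level $\delta$.

```python
import numpy as np

def project_box(v, b, delta):
    """Projection (11) of v onto C = {x : ||x - b||_inf <= delta}."""
    return b + np.minimum(np.maximum(v - b, -delta), delta)

def soft_threshold(u, t):
    """Component-wise sign(u) * max(|u| - t, 0)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
v = 3.0 * rng.standard_normal(8)
b = rng.standard_normal(8)
delta = 0.7

lhs = v - project_box(v, b, delta)        # (I - prox_{iota_C})(v)
rhs = soft_threshold(v - b, delta)        # prox_{delta ||.||_1}(v - b)
identity_holds = np.allclose(lhs, rhs)
```

The practical consequence is that the dual update never needs an explicit projection: a single soft-thresholding call suffices.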

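As a result of Lemma 2.2, equation (9) can be rewritten as follows:

$$\tau = \operatorname{prox}_{\delta\|\cdot\|_1}(A\beta + \tau - b). \qquad (12)$$

Therefore, by Theorem 2.1, finding a solution $\beta$ to problem (7) amounts to solving the coupled
fixed-point equations (8) and (12).

Two iterative schemes can be derived from equations (8) and (12). Let us write equation (8) as
$\beta = \operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}(\beta - \frac{\lambda}{\alpha}A^\top(2\tau - \tau))$. With any initial estimates $\tau^{-1} = \tau^0$ and $\beta^0$, the first iterative
scheme based upon equations (8) and (12) is as follows:

$$\begin{cases}
\beta^{k+1} = \operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}\left(\beta^k - \frac{\lambda}{\alpha}A^\top(2\tau^k - \tau^{k-1})\right), \\
\tau^{k+1} = \operatorname{prox}_{\delta\|\cdot\|_1}(A\beta^{k+1} + \tau^k - b).
\end{cases} \qquad (13)$$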

We would like to comment on the connection of this scheme with some existing ones. The dual
formulation of (7), as derived in [?], is

$$\max\{-\langle b, \tau \rangle - \delta\|\tau\|_1 : \|A^\top\tau\|_\infty \le 1\}.$$

Applying the primal-dual hybrid gradient method (see [12, Equation 2.18]) to the above dual
formulation yields exactly the iterative scheme (13). It was further pointed out in [8] that the
iterative scheme (13) is essentially the same as the linearized ADM applied to problem (7). In
other words, the iterative scheme (13) is the same as (5) in the case of $D = I$.

Now, let us introduce the second iterative scheme for problem (7). Let us write equation (12) as
$\tau = \operatorname{prox}_{\delta\|\cdot\|_1}(A(2\beta - \beta) + \tau - b)$. With any initial estimates $\beta^{-1} = \beta^0$ and $\tau^0$, the second iterative
scheme based upon equations (8) and (12) is as follows:

$$\begin{cases}
\tau^{k+1} = \operatorname{prox}_{\delta\|\cdot\|_1}(A(2\beta^k - \beta^{k-1}) + \tau^k - b), \\
\beta^{k+1} = \operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}\left(\beta^k - \frac{\lambda}{\alpha}A^\top\tau^{k+1}\right).
\end{cases} \qquad (14)$$

The sequence $\{(\beta^k, \tau^k) : k \in \mathbb{N}\}$ generated by the iterative schemes (13) and (14) will converge
for any initial seeds when $\lambda/\alpha < 1/\|A\|_2^2$. The proof of this convergence result can be found in
[?]. Hence, the limit of the sequence $\{(\beta^k, \tau^k) : k \in \mathbb{N}\}$ is a fixed point of equations (8) and (12).
In particular, the limit of the sequence $\{\beta^k : k \in \mathbb{N}\}$ is a solution to problem (7).

As noted in [6], the Dantzig selector often slightly underestimates the true values of the nonzero
parameters. To correct this bias and increase performance in practical settings, a postprocessing
procedure was proposed in [6]. Assume that $\beta^\infty$ is the limit of the sequence $\{\beta^k : k \in \mathbb{N}\}$ that
is generated through the iterative scheme (14). This postprocessing consists of two steps. The
first step is to estimate $\Lambda := \{i : \beta_i^\infty \neq 0\}$, the support of the vector $\beta^\infty$. Let $X_\Lambda$ be the $n \times |\Lambda|$
submatrix obtained by extracting the columns of $X$ corresponding to the indices in $\Lambda$, and let $\beta_\Lambda$
be the $|\Lambda|$-dimensional vector obtained by extracting the coordinates of $\beta \in \mathbb{R}^p$ corresponding to
the indices in $\Lambda$. The second step of the postprocessing is to construct the estimator $\hat\beta \in \mathbb{R}^p$ such
that

$$\hat\beta_\Lambda = \operatorname{argmin}\{\|X_\Lambda\beta - y\|_2 : \beta \in \mathbb{R}^{|\Lambda|}\}$$

and set the other coordinates to zero. If the matrix $X_\Lambda^\top X_\Lambda$ is invertible then $\hat\beta_\Lambda = (X_\Lambda^\top X_\Lambda)^{-1}X_\Lambda^\top y$.

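Putting all of the above discussion together, a complete two-stage procedure for finding a solution of
problem (7) is described in Algorithm 1.


Algorithm 1 (Two-stage scheme for problem (7))

Input: Set the fixed parameters
    $y \in \mathbb{R}^n$, $A \in \mathbb{R}^{p \times p}$, $b \in \mathbb{R}^p$, $\delta, \alpha, \text{tol} \in \mathbb{R}_+$, and $\lambda = 0.999\,\alpha/\|A\|_2^2$.

Initialization: Set the initial parameters
    $\tau^0 = 0$, $\beta^{-1} = \beta^0 = 0$, and $k = 0$.

Stage-I: Generate the sequence $\{(\tau^k, \beta^k) : k \in \mathbb{N}\}$ using Equations (10) and (14).
while (stopping criterion not met) do
    $\tau^{k+1} \leftarrow \operatorname{prox}_{\delta\|\cdot\|_1}(A(2\beta^k - \beta^{k-1}) + \tau^k - b)$,
    $\beta^{k+1} \leftarrow \operatorname{prox}_{\frac{1}{\alpha}\|\cdot\|_1}(\beta^k - \frac{\lambda}{\alpha}A^\top\tau^{k+1})$,
    $k \leftarrow k + 1$.
end while

Stage-II: Let $(\tau^\infty, \beta^\infty)$ be the last set of parameters computed in Stage-I.
    • Approximate $\operatorname{supp}(\beta^\infty)$ by $\Lambda = \{j : |\beta^\infty(j)| \ge \text{tol}\}$.
    • Compute $v = \operatorname{argmin}_{u \in \mathbb{R}^{|\Lambda|}} \{\|X_\Lambda u - y\|_2\}$.
    • Extend $v$ to form the Dantzig selector $\hat\beta$ on $\Lambda$:
        $\hat\beta(\Lambda(i)) = v(i)$, for $i = 1 : |\Lambda|$,
        $\hat\beta(j) = 0$, for $j \notin \Lambda$.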
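To make the two stages concrete, here is a minimal NumPy sketch of Algorithm 1. It is an illustration under stated assumptions rather than the authors' implementation (the experiments reported later were run in MATLAB). The function name `dantzig_two_stage`, the iteration cap `max_iter`, and the particular relative-change stopping test with tolerance `eps` are assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(u, t):
    """Component-wise sign(u) * max(|u| - t, 0), as in (10)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def dantzig_two_stage(X, y, delta, alpha, tol, max_iter=5000, eps=1e-8):
    """Sketch of Algorithm 1; returns (Stage-I iterate, Stage-II estimate, support)."""
    d = np.linalg.norm(X, axis=0)               # column norms (diagonal of D)
    A = (X.T @ X) / d[:, None]                  # A = D^{-1} X^T X
    b = (X.T @ y) / d                           # b = D^{-1} X^T y
    lam = 0.999 * alpha / np.linalg.norm(A, 2) ** 2   # spectral norm of A
    p = X.shape[1]
    tau, beta_prev, beta = np.zeros(p), np.zeros(p), np.zeros(p)

    # Stage I: iterate scheme (14).
    for _ in range(max_iter):
        tau_new = soft_threshold(A @ (2 * beta - beta_prev) + tau - b, delta)
        beta_new = soft_threshold(beta - (lam / alpha) * (A.T @ tau_new), 1.0 / alpha)
        # Assumed stopping test: relative change of the pair (tau, beta).
        change = np.linalg.norm(beta_new - beta) + np.linalg.norm(tau_new - tau)
        scale = max(np.linalg.norm(beta_new) + np.linalg.norm(tau_new), 1.0)
        beta_prev, beta, tau = beta, beta_new, tau_new
        if change <= eps * scale:
            break

    # Stage II: least squares on the estimated support, zeros elsewhere.
    support = np.flatnonzero(np.abs(beta) >= tol)
    beta_hat = np.zeros(p)
    if support.size:
        v, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta_hat[support] = v
    return beta, beta_hat, support

# Sanity check on X = I, where the solution of (7) is soft-thresholding of y at delta.
y_demo = np.array([3.0, -2.0, 0.1, 0.0])
beta_demo, beta_hat_demo, support_demo = dantzig_two_stage(
    np.eye(4), y_demo, delta=0.5, alpha=1.0, tol=0.1)
```

With $X = I$ the constraint reduces to $\|\beta - y\|_\infty \le \delta$, so Stage-I should approach $(2.5, -1.5, 0, 0)$, which shows the shrinkage bias, while Stage-II restores the values $3$ and $-2$ on the detected support.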


Stage-I of Algorithm 1 terminates once the sequence $\{(\tau^k, \beta^k) : k \in \mathbb{N}\}$ reaches a stationary
point. To estimate when this occurs, terminate the iterations when either of the following stopping
criteria is met:

1. The relative change between successive terms in the sequence falls below a specified
tolerance $\epsilon > 0$, or

2. The support of the sequence $\{\beta^k\}$ is stationary for a specified number of successive iterations:

$$\operatorname{supp}(\beta^k) = \operatorname{supp}(\beta^{k+1}) = \cdots = \operatorname{supp}(\beta^{k+\eta}),$$

for a fixed $\eta \in \mathbb{N}$ and some positive integer $k$.
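The support-based criterion can be tracked with a small amount of state. The sketch below is one possible realization; the class name `SupportMonitor` is hypothetical, and treating an entry as part of the support exactly when it is nonzero is an assumption of this sketch.

```python
import numpy as np

class SupportMonitor:
    """Counts how many successive iterates have shared the same support."""

    def __init__(self, eta):
        self.eta = eta      # required number of successive unchanged supports
        self.last = None    # support of the previous iterate
        self.run = 0        # current count of successive matches

    def update(self, beta):
        """Record a new iterate; return True once the support has been
        unchanged over eta successive iterations."""
        support = tuple(np.flatnonzero(beta))
        self.run = self.run + 1 if support == self.last else 0
        self.last = support
        return self.run >= self.eta

monitor = SupportMonitor(eta=2)
iterates = ([1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0], [0.0, 2.0])
flags = [monitor.update(np.array(v)) for v in iterates]
```

In the example the support changes at the third iterate, which resets the counter, so the criterion first fires at the fifth iterate.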

Stage-I is the largest contributor to the computational complexity of Algorithm 1, with each
iteration having complexity $O(4np)$. In comparison, each outer loop of ADM computing the $\tau$-
and $\gamma$-related subproblems has complexity $O(4np)$, while each inner loop of ADM approximating
the $\beta$-related subproblem has complexity $O(8np)$. In general, Algorithm 1 and ADM will use
a different number of iterations to terminate their iterative stages, so their overall complexities
cannot be directly compared. However, the numerical experiments in the next section indicate
that Algorithm 1 tends to have less overall complexity than ADM, since Algorithm 1 has a shorter
runtime even in situations where it requires more iterations.


3 Numerical Experiments 

In the following experiments, we apply the proposed proximity operator based approach presented
in Algorithm 1 and the alternating direction method (ADM) presented in [18] to solve the Dantzig
selector problem (2) using both synthetic and real data sets. The experiments using synthetic
data are performed in MATLAB R2013a on single nodes of the Condor Supercomputer, hosted at
AFRL/RIT Affiliated Resource Center. The full capabilities of Condor were not taken advantage
of; we ran the algorithms in serial using single nodes to emulate a typical high end consumer
workstation. Each utilized node is equipped with an Intel Xeon X5650 6 core CPU, with 2.67 GHz
and 6x8 GB RAM. The experiments using the real data set are performed in MATLAB R2014a
on a PC with an Intel Core i7-3630QM 2.40 GHz processor and 16 GB RAM running Windows 7
Enterprise.

Example 3.1 Synthetic Data Set 

In this series of simulations, sparse coefficient vectors are generated and then recovered from noisy
random linear observations using both Algorithm 1 and ADM. The parameters used are $n =
720m$, $p = 2560m$ and $s = 80m$ for $m \in \{2, 3, \ldots, 10\}$, and $\sigma \in \{0.01, 0.05, 0.10, 0.15\}$,
corresponding to 1%, 5%, 10% and 15% noise levels. For each combination of $m$ and $\sigma$, 100 simulations
each of Algorithm 1 and ADM are performed. All other parameters for ADM are selected following
the guidelines in [18, Section 3], and the parameters selected for the initialization stage of
Algorithm 1 are $\text{tol} = 2\sigma$, $\alpha = 0.2\|A\|_2$ and $\delta = \sigma\sqrt{2\log p}$. The parameters for the stopping criteria
are e = 10“^ and r\ = max{[41og(a) log(cj) + 2a\ , 5}. The parameters tol, 6 and rj depend on 
the noise level $\sigma$, which in practice may not be known a priori. However, the noise level may
be well approximated using existing methods. In the event that the noise level is not accurately
approximated, the speed of convergence of Algorithm 1 will be affected, but the accuracy should
not suffer much.

The $n \times p$ sensing matrices $X$ are generated for each simulation with independent Gaussian
entries normalized so each column has unit $\ell_2$ norm. To generate the coefficient vector, for each
simulation a support set $S$ of size $|S| = s$ is selected uniformly at random. Then the vector $\beta$ with
indices in $S$ is defined according to $\beta_S(i) = \epsilon_i(1 + |a_i|)$, where $\{a_i\}$ is a collection of independently
and identically distributed random variables sampled from the standard normal distribution and
$\{\epsilon_i\}$ is a collection of independently and identically distributed random variables sampled from the
uniform distribution on $\{-1, 1\}$. For $i \notin S$, set $\beta_i = 0$. Then Algorithm 1 and ADM are used
to approximate the Dantzig selector $\hat\beta$ from the observations $y = X\beta + z$, where $z$ is a collection
of independent and identically distributed random variables sampled from the normal distribution
with mean zero and standard deviation $\sigma$.
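The data-generation recipe just described can be sketched in NumPy as follows. The function name `make_problem` and the use of `numpy.random.Generator` are choices made for this illustration, and the dimensions in the usage line are scaled down from those of the experiments.

```python
import numpy as np

def make_problem(n, p, s, sigma, rng):
    """Generate (X, y, beta, support) following the description of Example 3.1."""
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)                  # unit l2-norm columns
    support = rng.choice(p, size=s, replace=False)  # uniformly random support S
    signs = rng.choice([-1.0, 1.0], size=s)         # uniform on {-1, 1}
    a = rng.standard_normal(s)                      # standard normal a_i
    beta = np.zeros(p)
    beta[support] = signs * (1.0 + np.abs(a))       # beta_S(i) = eps_i (1 + |a_i|)
    z = sigma * rng.standard_normal(n)              # noise with standard deviation sigma
    y = X @ beta + z
    return X, y, beta, support

# Scaled-down instance; the seed is arbitrary.
X, y, beta, S = make_problem(72, 256, 8, 0.05, np.random.default_rng(0))
```

By construction every nonzero coefficient has magnitude at least 1, so the signal is well separated from the noise levels considered.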

[Figure 3.1 panels: Algorithm 1 without Postprocessing; Algorithm 1 with Postprocessing]

Figure 3.1: A demonstration of the accuracy of the Dantzig selector recovered using Algorithm 1 and
ADM with and without postprocessing for a single simulation of Example 3.1 with parameters
$\sigma = 0.05$ and $(n, p, s) = (720, 2560, 80)$.


The accuracy of the Dantzig selector recovered in the simulations is measured by

$$\rho := \left( \frac{\|\hat\beta - \beta\|_2^2}{\sum_{i=1}^{p} \min\{\beta_i^2, \sigma^2\}} \right)^{1/2}, \qquad (15)$$

where $\beta$ denotes the true parameter and $\hat\beta$ denotes the parameter recovered using either Algorithm 1
or ADM. The denominator term of Equation (15) is the expected mean squared error of the ideal
estimator [?]. Therefore, $\rho \ge 0$, and a smaller $\rho$ implies a more accurate estimator.
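The accuracy measure in (15) translates directly into code. The following short sketch uses the hypothetical function name `rho` and assumes that the true vector has at least one nonzero entry and that $\sigma > 0$, so that the denominator is positive.

```python
import numpy as np

def rho(beta_hat, beta_true, sigma):
    """Accuracy measure (15): error energy relative to that of the ideal estimator."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    beta_true = np.asarray(beta_true, dtype=float)
    num = np.sum((beta_hat - beta_true) ** 2)
    den = np.sum(np.minimum(beta_true ** 2, sigma ** 2))
    return np.sqrt(num / den)

# Numerator 0.25, denominator 0.25 + 0 + 0.25 = 0.5.
example = rho([2.5, 0.0, -1.0], [2.0, 0.0, -1.0], 0.5)
```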


[Figure 3.2 panels: Accuracy ($\sigma = 0.01$); Accuracy ($\sigma = 0.05$); Accuracy ($\sigma = 0.10$); Accuracy ($\sigma = 0.15$); horizontal axis: $m$]

Figure 3.2: A comparison of $\rho$, computed as in Equation (15), which measures the accuracy of
the approximated Dantzig selectors, for Algorithm 1 and ADM for noise levels $\sigma = 0.01, 0.05, 0.10$
and $0.15$ in Example 3.1. In each plot, the points along the curve represent the mean value of
$\rho$ for each parameter $m$ over 100 simulations, and the points on the vertical lines
represent one standard deviation away from the means.


The effects of Stage-II of Algorithm 1 and the postprocessing step of ADM are illustrated in
Figure 3.1. The figure displays values of the exact simulated vector $\beta$ and of the Dantzig selector
$\hat\beta$ approximated by each algorithm, first without performing postprocessing (the left column of
Figure 3.1) and then with postprocessing (the right column of Figure 3.1) for one simulation
with parameters $(n, p, s) = (720, 2560, 80)$ and noise $\sigma = 0.05$. One can clearly see that the
postprocessing not only corrects the underestimated magnitudes of nonzero components of the
estimates, but also eliminates unwanted nonzero components.

The results of the above simulations suggest that Algorithm 1 has less overall complexity than
ADM, since the accuracy of the Dantzig selectors approximated by each method is similar yet
Algorithm 1 completes much faster than ADM, even when requiring more iterations. Figure 3.2
displays the mean and standard deviation of $\rho$ over 100 simulations for each parameter $m$ and $\sigma$
and for both Algorithm 1 and ADM. Note that the accuracy of the Dantzig selector approximated
by the two algorithms is very similar across all parameter levels. Figure 3.3 displays the mean
and standard deviation of the CPU time, and Figure 3.4 displays the mean and standard deviation
of the total number of iterations performed by Algorithm 1 and the total number of iterations
performed in the inner loop of ADM for 100 simulations for each parameter $m$ and $\sigma$. From the
figures, one can see that although Algorithm 1 requires more iterations than ADM, Algorithm 1
completes significantly faster.

[Figure 3.3 panels: CPU Time ($\sigma = 0.01$); CPU Time ($\sigma = 0.05$); CPU Time ($\sigma = 0.10$); CPU Time ($\sigma = 0.15$)]

Figure 3.3: A comparison of the CPU time required to recover the Dantzig selector using
Algorithm 1 and ADM for noise levels $\sigma = 0.01, 0.05, 0.10$ and $0.15$ as in Example 3.1. In each plot, the
points along the curve represent the mean CPU time required for each parameter $m$ over
100 simulations, and the points on the vertical lines represent one standard deviation away from
the means.


Example 3.2 Leukemia Data Set 


In this experiment, the Dantzig selectors produced by Algorithm 1 and by ADM are used with
a collection of biomarker data to indicate whether a patient may be diagnosed with a specific type
of cancer. The biomarker dataset, first introduced in [13] and studied in [27, 28], contains the
measurements of 7128 genes related to leukemia diagnoses. The dataset is split into a training set
and a testing set. The training set is sampled from 38 patients, 27 of whom were diagnosed with
acute lymphocytic leukemia (ALL) and 11 with acute myelogenous leukemia (AML). The testing
set is sampled from 34 patients, 20 diagnosed with ALL and 14 with AML.

Let $X_{\text{train}} \in \mathbb{R}^{38 \times 7128}$ contain the biomarker data in the training set, where each row holds all 7128
gene measurements of a single patient and each column has been normalized to have unit $\ell_2$ norm.
Let $y_{\text{train}} \in \mathbb{R}^{38}$ be the column vector indicating the diagnosis of each patient in the training set:

$$y_{\text{train}}(j) = \begin{cases}
0, & \text{if patient } j \text{ in the training set is diagnosed with ALL,} \\
1, & \text{if patient } j \text{ in the training set is diagnosed with AML.}
\end{cases}$$

Similarly define $X_{\text{test}} \in \mathbb{R}^{34 \times 7128}$ and $y_{\text{test}} \in \mathbb{R}^{34}$ for the data in the testing set.

[Figure 3.4 panels: Iterations ($\sigma = 0.01$); Iterations ($\sigma = 0.05$); Iterations ($\sigma = 0.10$); Iterations ($\sigma = 0.15$)]

Figure 3.4: A comparison of the number of iterations required to recover the Dantzig selector using
Algorithm 1 and ADM for noise levels $\sigma = 0.01, 0.05, 0.10$ and $0.15$ in Example 3.1. In each plot,
the points along the curve represent the mean number of iterations required for each parameter $m$
over 100 simulations, and the points on the vertical lines represent one standard deviation away
from the means.

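This experiment has a training phase and a testing phase. In the training phase, a sparse vector
$\hat\beta$ is found such that $X_{\text{train}}\hat\beta \approx y_{\text{train}}$. To preprocess the data, only the biomarkers with the largest
variance are used to train the parameter $\beta$. To this end, select a positive integer $N$, and let $\Lambda$ be
the $N$ indices of columns from $X_{\text{train}}$ with largest variance. Let $X_{\text{train},\Lambda} \in \mathbb{R}^{38 \times N}$ be the submatrix
of $X_{\text{train}}$ with columns in $\Lambda$. Form the reduced problem

$$\hat\beta_\Lambda \in \operatorname{argmin}_{\beta \in \mathbb{R}^N} \left\{ \|\beta\|_1 : \|X_{\text{train},\Lambda}^\top (X_{\text{train},\Lambda}\beta - y_{\text{train}})\|_\infty \le \delta \right\}. \qquad (16)$$

The Dantzig selector $\hat\beta_\Lambda \in \mathbb{R}^N$ satisfying problem (16) is computed using Algorithm 1 and ADM,
then extended to form $\hat\beta \in \mathbb{R}^{7128}$ via

$$\hat\beta(\Lambda(j)) = \hat\beta_\Lambda(j), \text{ for } j = 1 : N, \qquad \hat\beta(k) = 0, \text{ if } k \notin \Lambda.$$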
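The variance-based column selection and the zero-extension step can be sketched as below. The function names `top_variance_columns` and `extend` are illustrative, and the sketch uses the population variance of each column of whatever matrix it is given; the text does not specify the exact variance convention, so that choice is an assumption.

```python
import numpy as np

def top_variance_columns(X, N):
    """Indices (in increasing order) of the N columns of X with largest variance."""
    variances = X.var(axis=0)
    return np.sort(np.argsort(variances)[-N:])

def extend(beta_sub, idx, p):
    """Embed the reduced coefficient vector into R^p, with zeros off the index set."""
    beta = np.zeros(p)
    beta[idx] = beta_sub
    return beta

# Small example: columns 2 and 3 vary, columns 0 and 1 are constant.
X_demo = np.array([[0.0, 1.0,  5.0,  2.0],
                   [0.0, 1.0, -5.0, -2.0],
                   [0.0, 1.0,  5.0,  2.0]])
idx = top_variance_columns(X_demo, 2)
beta_full = extend(np.array([7.0, 9.0]), idx, X_demo.shape[1])
```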


In the testing phase, the trained parameter $\hat\beta$ is used to predict the diagnoses of patients in the
testing set. The predictive indicator vector $\hat y_{\text{test}} \in \mathbb{R}^{34}$ is computed from $\hat y = X_{\text{test}}\hat\beta$ by thresholding
and clustering values near the threshold boundary. Set

$$\hat y_{\text{test}}(j) = \begin{cases}
0, & \text{if } \hat y(j) < 0.49, \\
1, & \text{if } 0.51 < \hat y(j).
\end{cases}$$

Let $y_0 = \max\{\hat y(j) : \hat y(j) < 0.49\}$ and $y_1 = \min\{\hat y(j) : 0.51 < \hat y(j)\}$. For values of $j$ such that
$0.49 \le \hat y(j) \le 0.51$, set

$$\hat y_{\text{test}}(j) = \begin{cases}
0, & \text{if } |\hat y(j) - y_0| < |\hat y(j) - y_1|, \\
1, & \text{if } |\hat y(j) - y_1| < |\hat y(j) - y_0|.
\end{cases}$$

Patient $j$ in the testing set is predicted to have a diagnosis of ALL if $\hat y_{\text{test}}(j) = 0$ and a
diagnosis of AML if $\hat y_{\text{test}}(j) = 1$.
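The thresholding and clustering rule can be written compactly as follows. This is a sketch with the hypothetical function name `classify`; it assumes that at least one score lies below 0.49 and at least one above 0.51, and it assigns exact ties in the boundary band to class 0, a case the rule above leaves unspecified.

```python
import numpy as np

def classify(scores, lo=0.49, hi=0.51):
    """Map raw scores to {0, 1}: threshold first, then assign scores in the
    band [lo, hi] to the nearer of the two boundary values y0 and y1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.where(scores > hi, 1, 0)
    y0 = scores[scores < lo].max()        # largest score clearly in class 0
    y1 = scores[scores > hi].min()        # smallest score clearly in class 1
    band = (scores >= lo) & (scores <= hi)
    closer_to_y1 = np.abs(scores[band] - y1) < np.abs(scores[band] - y0)
    labels[band] = closer_to_y1.astype(int)
    return labels

# Here y0 = 0.48 and y1 = 0.52, so 0.495 joins class 0 and 0.505 joins class 1.
labels = classify([0.10, 0.48, 0.495, 0.505, 0.90, 0.52])
```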

The above procedure was used to predict the diagnoses of patients in the testing set using
the Dantzig selector $\hat\beta_\Lambda$ computed using both Algorithm 1 and ADM with parameters $N = 1000$,
a = Atrainll? and tol = 0.1 and stopping criteria parameters rj = 80 and e = 10“^ for each S 

in $\{0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375\}$. Figure 3.5 displays the results of these simulations
regarding the accuracy of the recovered indicator vector $\hat y_{\text{test}}$ in predicting the leukemia diagnoses of
patients in the testing set, as well as the number of iterations and CPU runtime used by Algorithm 1
and ADM. As shown in Figure 3.5(a), Algorithm 1 typically predicted the diagnoses of patients with
higher accuracy than ADM. Moreover, for each parameter $\delta$, Algorithm 1 used fewer iterations than
ADM and the time used by Algorithm 1 was several orders of magnitude less than the time used
by ADM, as shown in Figures 3.5(c) and (d). Figure 3.5(b) illustrates the tendency of Algorithm 1
to predict the diagnosis of patients in the testing set with higher accuracy than ADM. This plot
displays the values of $\hat y = X_{\text{test}}\hat\beta$ recovered using Algorithm 1 and by ADM prior to the thresholding
step, along with the true values of $y_{\text{test}}$. Since the values recovered by Algorithm 1 tend to be more
spread out, it is easier to accurately separate them into two distinct clusters.


4 Conclusion 

In this paper, we have developed an iterative algorithm to compute the Dantzig selector, the solution
to the minimization problem (2). The algorithm is based on the proximity operator
and its relationship to problem (7). The two-stage algorithm we proposed is an improvement over
some other recently proposed methods to find the Dantzig selector, which require the use of an inner
loop to estimate parameters within each step of the algorithm. Additionally, our proposed method
uses a novel stopping criterion based upon the support of the approximated parameters.

We compare the proposed algorithm to the alternating direction method proposed in [18].
In theory, the two methods produce results of similar quality; however, each iteration of Stage-I
of Algorithm 1 has less computational complexity than each iteration of the inner loop of the
alternating direction method. The numerical experiments demonstrate that the proposed method
and the alternating direction method typically approximate the Dantzig selectors with similar
accuracy, yet Algorithm 1 produces results in significantly less time, whether it uses more iterations
than the alternating direction method, as in Example 3.1, or fewer iterations than the alternating
direction method, as in Example 3.2.

[Figure 3.5 panels: (a) Number of Misdiagnosed Patients; (b) Predicted indicator; (c) Number of iterations; (d) CPU Time]

(a) The number of patients in the testing set misdiagnosed by the predicted indicator vector
recovered by Algorithm 1 and ADM for various values of the parameter $\delta$.

(b) Values of the actual diagnosis indicator vector $y_{\text{test}}$ along with values of the predicted
indicator vectors recovered by Algorithm 1 and ADM prior to separating values into
classification groups for $\delta = 0.25$.

(c) The number of iterations required by Algorithm 1 and ADM to recover the Dantzig
selector $\hat\beta_\Lambda$ for various values of the parameter $\delta$.

(d) The CPU runtime required by Algorithm 1 and ADM to recover the Dantzig
selector $\hat\beta_\Lambda$ for various values of the parameter $\delta$.

Figure 3.5: Plots regarding the indicator vector, used to predict a leukemia diagnosis in patients
in the testing set as in Example 3.2, using both Algorithm 1 and ADM.

the Dantzig selector using the alternating direction method and for sharing the real dataset used
in Example 3.2.